Small hardware implementation of the SubByte function of Rijndael

ABSTRACT

A small hardware implementation is provided for the Advanced Encryption Standard SubByte function that implements the affine transform and inverse transform in a single Affine-All transform using a multiplicative inverse ROM. The logic is greatly reduced and the maximum path delay is reduced compared to a multiplexor implementation and is slightly greater than a ROM implementation.

CROSS REFERENCE TO RELATED APPLICATION

This application claims the benefit of U.S. provisional application Ser. No. 60/433,365 filed Dec. 13, 2002; and 60/473,527 filed May 27, 2003, which are incorporated herein by reference.

FIELD OF THE INVENTION

The present invention relates to the field of data encryption. The invention relates particularly to an apparatus and method for a small hardware implementation of the SubByte function found in the Advanced Encryption Standard (AES) algorithm or Rijndael Block Cipher, hereinafter AES/Rijndael. The implementation is redesigned to work with both inverse and normal processing.

DISCUSSION OF THE RELATED ART

The current state of the art provides for hardware implementations where the inverse cipher can only partially re-use the circuitry that implements the cipher. For high-speed networking processors and Smart Card applications, a very small gate size and a high data rate (accommodating an Optical Carrier rate of OC-192, that is, 9953.28 Mbps with a payload of 9.6 Gbps, and beyond) are desirable.

The AES/Rijndael is an iterated block cipher and is described in a proposal written by Joan Daemen and Vincent Rijmen and published on Mar. 9, 1999. The National Institute of Standards and Technology (NIST) has approved the AES/Rijndael as a cryptographic algorithm and published the AES/Rijndael on Nov. 26, 2001 (Publication 197, also known as Federal Information Processing Standard 197 or “FIPS 197”), which is hereby incorporated by reference as if fully set forth herein. In accordance with many private key encryption/decryption algorithms, including AES/Rijndael, encryption/decryption is performed in multiple stages, commonly known as iterations or rounds. Such algorithms lend themselves to a data processing pipeline or pipelined architecture. In each round, the AES/Rijndael uses the affine transformation and its inverse along with other transformations to decrypt (decipher) and encrypt (encipher) information. Encryption converts data to an unintelligible form called ciphertext; decrypting the ciphertext converts the data back into its original form, called plaintext.

The input and output for the AES/Rijndael algorithm each consist of sequences of 128 bits (each having a value of 0 or 1). These sequences are commonly referred to as blocks and the number of bits they contain is referred to as their length (“FIPS 197”, NIST, p. 7). The basic unit for processing in the AES/Rijndael algorithm is a byte, a sequence of eight bits treated as a single entity with the most significant bit (MSB) on the left. Internally, the AES/Rijndael algorithm's operations are performed on a two-dimensional array of bytes called the State. The State consists of four rows of bytes, each containing Nb bytes, where Nb is the block length divided by 32 (“FIPS 197”, NIST, p. 9).

At the start of the Cipher and Inverse Cipher (encryption and decryption), the input, the array of bytes

-   in0, in1, . . . in15

is copied into the State array as illustrated in FIG. 1. The Cipher or Inverse Cipher operations are then conducted on each byte in this State array, after which its final values are copied to the output, the array of bytes

-   out0, out1, . . . out15.

The addition of two elements in a finite field is achieved by “adding” the coefficients for the corresponding powers in the polynomials for the two elements. The addition is performed with the Boolean exclusive OR (XOR) operation (“FIPS 197”, NIST, p. 10). The binary notation for adding two bytes is:

$\begin{matrix}{\{01010111\} \oplus \{10000011\} = \{11010100\}} & (1.0)\end{matrix}$

In the polynomial representation, multiplication in GF(2⁸) corresponds with the multiplication of polynomials modulo an irreducible polynomial of degree 8. A polynomial is irreducible if its only divisors are one and itself. For the AES/Rijndael algorithm, this irreducible polynomial is (“FIPS 197”, NIST, p. 10):

$\begin{matrix}{m(x) = x^{8} + x^{4} + x^{3} + x + 1} & (1.1)\end{matrix}$
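As an illustration of equations (1.0) and (1.1), the following minimal Python sketch models addition and multiplication in GF(2⁸). The names `gf_add`, `gf_mul` and `M_X` are illustrative assumptions and are not part of the described circuit; the shift-and-reduce loop is one common way to multiply modulo m(x), not necessarily the method of any embodiment.

```python
# Illustrative software model only; names are assumptions, not from the specification.
M_X = 0x11B  # m(x) = x^8 + x^4 + x^3 + x + 1, equation (1.1)


def gf_add(a, b):
    """Add two GF(2^8) elements: coefficient-wise XOR, as in equation (1.0)."""
    return a ^ b


def gf_mul(a, b):
    """Multiply two GF(2^8) elements modulo m(x) by shift-and-reduce."""
    result = 0
    while b:
        if b & 1:          # current coefficient of b is 1: accumulate a
            result ^= a
        a <<= 1            # multiply a by x
        if a & 0x100:      # degree reached 8: reduce modulo m(x)
            a ^= M_X
        b >>= 1
    return result


if __name__ == "__main__":
    print(format(gf_add(0b01010111, 0b10000011), "08b"))
    print(format(gf_mul(0x57, 0x83), "02x"))
```

Calling `gf_add(0b01010111, 0b10000011)` reproduces the sum shown in equation (1.0).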

A diagonal matrix with each diagonal element equal to 1 is called an identity matrix. The n×n identity matrix is denoted I_n (shown here for n = 5):

$\begin{matrix}{I_{n} = \begin{bmatrix}1 & 0 & 0 & 0 & 0 \\0 & 1 & 0 & 0 & 0 \\0 & 0 & 1 & 0 & 0 \\0 & 0 & 0 & 1 & 0 \\0 & 0 & 0 & 0 & 1\end{bmatrix}} & (1.2)\end{matrix}$

If A and B are n×n matrices, we call each an inverse of the other if:

$\begin{matrix}{AB = BA = I_{n}} & (1.3)\end{matrix}$

A transformation consisting of multiplication by a matrix followed by the addition of a vector is called an Affine Transformation.

The SubByte( ) function of AES/Rijndael is a non-linear byte substitution that operates independently on each byte of the State using a substitution table (S-box). This S-box, which is invertible, is constructed by composing two transformations:

1. Take the multiplicative inverse in the finite field GF(2⁸), described earlier; the element {00} is mapped to itself.

2. Apply the following affine transformation (over GF(2)):

$\begin{matrix}{b_{i}^{\prime} = b_{i} \oplus b_{(i+4) \bmod 8} \oplus b_{(i+5) \bmod 8} \oplus b_{(i+6) \bmod 8} \oplus b_{(i+7) \bmod 8} \oplus c_{i}} & (1.4)\end{matrix}$

In matrix form, the affine transformation element of the S-box can be expressed as (“FIPS 197”, NIST, p. 16):

$\begin{matrix}{\begin{bmatrix}b_{0}^{\prime} \\b_{1}^{\prime} \\b_{2}^{\prime} \\b_{3}^{\prime} \\b_{4}^{\prime} \\b_{5}^{\prime} \\b_{6}^{\prime} \\b_{7}^{\prime}\end{bmatrix} = {{\begin{bmatrix}1 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\1 & 1 & 0 & 0 & 0 & 1 & 1 & 1 \\1 & 1 & 1 & 0 & 0 & 0 & 1 & 1 \\1 & 1 & 1 & 1 & 0 & 0 & 0 & 1 \\1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \\0 & 1 & 1 & 1 & 1 & 1 & 0 & 0 \\0 & 0 & 1 & 1 & 1 & 1 & 1 & 0 \\0 & 0 & 0 & 1 & 1 & 1 & 1 & 1\end{bmatrix}\begin{bmatrix}b_{0} \\b_{1} \\b_{2} \\b_{3} \\b_{4} \\b_{5} \\b_{6} \\b_{7}\end{bmatrix}} + {\begin{bmatrix}1 \\1 \\0 \\0 \\0 \\1 \\1 \\0\end{bmatrix}.}}} & (1.5)\end{matrix}$
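The two-step S-box construction can be sketched in software as follows. This is a minimal model, assuming a brute-force search for the multiplicative inverse and the constant c = {63} (the addition vector of equation (1.5) read with b₀ as the least significant bit); the helper names `gf_mul`, `gf_inv`, `affine` and `sub_byte` are hypothetical.

```python
# Illustrative model of the S-box construction; names are assumptions.
M_X = 0x11B   # irreducible polynomial m(x) of equation (1.1)
C = 0x63      # addition vector of equation (1.5), bit i = c_i


def gf_mul(a, b):
    """Multiply in GF(2^8) modulo m(x)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= M_X
        b >>= 1
    return result


def gf_inv(a):
    """Step 1: multiplicative inverse, with {00} mapped to itself."""
    if a == 0:
        return 0
    for candidate in range(1, 256):
        if gf_mul(a, candidate) == 1:
            return candidate
    raise ValueError("no inverse found")


def affine(b):
    """Step 2: affine transformation of equation (1.4), bit by bit."""
    out = 0
    for i in range(8):
        bit = (b >> i) & 1
        for offset in (4, 5, 6, 7):
            bit ^= (b >> ((i + offset) % 8)) & 1
        bit ^= (C >> i) & 1
        out |= bit << i
    return out


def sub_byte(x):
    """Compose the two steps into the S-box substitution."""
    return affine(gf_inv(x))


if __name__ == "__main__":
    print(format(sub_byte(0x53), "02x"))
```

A lookup-table implementation would simply store `sub_byte(x)` for all 256 inputs; the sketch computes the values instead so that both steps remain visible.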

If this were implemented as the lookup table as suggested by the AES/Rijndael proposal, a 256 entry ROM or multiplexor would be required. To implement the AES/Rijndael algorithm, 12 instantiations of this table would be required. The inverse of this matrix can be found as:

$\begin{matrix}{\begin{bmatrix}b_{0}^{\prime} \\b_{1}^{\prime} \\b_{2}^{\prime} \\b_{3}^{\prime} \\b_{4}^{\prime} \\b_{5}^{\prime} \\b_{6}^{\prime} \\b_{7}^{\prime}\end{bmatrix} = {{\begin{bmatrix}0 & 0 & 1 & 0 & 0 & 1 & 0 & 1 \\1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 \\0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 \\1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 \\0 & 1 & 0 & 1 & 0 & 0 & 1 & 0 \\0 & 0 & 1 & 0 & 1 & 0 & 0 & 1 \\1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 \\0 & 1 & 0 & 0 & 1 & 0 & 1 & 0\end{bmatrix}\begin{bmatrix}b_{0} \\b_{1} \\b_{2} \\b_{3} \\b_{4} \\b_{5} \\b_{6} \\b_{7}\end{bmatrix}} + \begin{bmatrix}1 \\0 \\1 \\0 \\0 \\0 \\0 \\0\end{bmatrix}}} & (1.6)\end{matrix}$

If this were implemented as the lookup table suggested by the AES/Rijndael proposal, a 128-entry, 16-bit word ROM or multiplexor would be required. To implement the AES/Rijndael algorithm, 12 instantiations of this table would be required.
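The relationship between equations (1.5) and (1.6) can be checked with a short sketch that multiplies the two matrices over GF(2) and round-trips every byte through both transformations. The rows and vectors below are transcribed from the two equations (column 0 corresponds to b₀); the function names are illustrative assumptions.

```python
# Illustrative check that (1.6) inverts (1.5); names are assumptions.
AFFINE_ROWS = [
    "10001111", "11000111", "11100011", "11110001",
    "11111000", "01111100", "00111110", "00011111",
]
AFFINE_VECTOR = "11000110"
INVERSE_ROWS = [
    "00100101", "10010010", "01001001", "10100100",
    "01010010", "00101001", "10010100", "01001010",
]
INVERSE_VECTOR = "10100000"
IDENTITY = ["".join("1" if i == j else "0" for j in range(8)) for i in range(8)]


def apply_affine(rows, vector, byte):
    """Compute rows * bits(byte) + vector over GF(2); bit i of the result is b'_i."""
    bits = [(byte >> j) & 1 for j in range(8)]
    out = 0
    for i in range(8):
        acc = int(vector[i])
        for j in range(8):
            acc ^= int(rows[i][j]) & bits[j]
        out |= acc << i
    return out


def matrix_product_gf2(a_rows, b_rows):
    """Multiply two 8x8 bit matrices with XOR as addition and AND as multiplication."""
    return [
        "".join(
            str(sum(int(a_rows[i][k]) & int(b_rows[k][j]) for k in range(8)) % 2)
            for j in range(8)
        )
        for i in range(8)
    ]


if __name__ == "__main__":
    print(matrix_product_gf2(AFFINE_ROWS, INVERSE_ROWS) == IDENTITY)
    print(all(
        apply_affine(INVERSE_ROWS, INVERSE_VECTOR,
                     apply_affine(AFFINE_ROWS, AFFINE_VECTOR, x)) == x
        for x in range(256)
    ))
```

The matrix product illustrates definition (1.3); the byte round trip additionally confirms that the two addition vectors are consistent with each other.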

Thus there is a need for a system and a method of sharing almost all the circuitry for the affine transformation in order to reduce gate count. To achieve a high data rate and small gate size, the design must be architected so that the maximum path is not significantly longer and the gate size is so small that the design can be replicated to promote parallel processing without greatly increasing the die size. Increasing die size adds more expense and power consumption, making the product less marketable. The present invention is an apparatus and a method for decreasing the gate size at the expense of slightly increasing the maximum path delay. This makes the circuit smaller and thus more attractive for high data-rate designs.

Each occurrence in the AES/Rijndael of the pair of affine transform and inverse affine transform is reduced by the present invention to one transform, the Affine-All transform. In a preferred embodiment, a circuit performs both normal and inverse affine transformations with very little duplicate logic. In this preferred embodiment, by implementing the Affine-All transform with a Multiplicative Inverse ROM, the logic is greatly reduced and the maximum path delay is reduced compared to a multiplexor implementation while only being slightly greater than for a ROM implementation.

Thus, the preferred embodiment of the present invention employs a read-only memory (ROM) for the multiplicative inverse and a reduced combinational logic implementation for the affine transformation. This implementation is very low in gate count with a comparable maximum delay path.

FIG. 1 illustrates state array input and output (“FIPS 197”, NIST, p. 9).

FIG. 2 illustrates a comparison of prior art ROM and lookup table (multiplexor) implementations of the SubByte function with the Affine-All implementation of the present invention.

FIG. 3 illustrates the ROM or lookup table used with the Affine-All transformation of the present invention.

FIG. 4 illustrates the netlist of the Affine-All combinational logic.

The present invention is based, in part, on the fact that, beginning at the last row, each row of the matrices in equations (1.5) and (1.6) is the following row rotated left by one bit; equivalently, each row is the row above it rotated right by one bit. In the present invention, the first row of each matrix is termed the “load pattern”. So the “load pattern” for the affine transform matrix is {10001111} and the “load pattern” for the inverse affine transform is {00100101}. Note that the number of 0's in each “load pattern” is an odd number and is an important characteristic in being able to merge the two transformations into one circuit in the system and method of the present invention.
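The rotation property can be illustrated with a short sketch that regenerates all eight rows of each matrix from its load pattern alone. The function names are hypothetical; the patterns are written as bit strings with the b₀ column first, matching the row layout of equations (1.5) and (1.6).

```python
# Illustrative sketch: rebuild a matrix from its load pattern; names are assumptions.
def rotate_right(pattern):
    """Rotate a bit string right by one position (last bit wraps to the front)."""
    return pattern[-1] + pattern[:-1]


def rows_from_load_pattern(load_pattern):
    """Row 0 is the load pattern; every later row is the previous row rotated right."""
    rows = [load_pattern]
    while len(rows) < 8:
        rows.append(rotate_right(rows[-1]))
    return rows


if __name__ == "__main__":
    for pattern in ("10001111", "00100101"):
        print(pattern, "zeros:", pattern.count("0"))
        for row in rows_from_load_pattern(pattern):
            print("   ", row)
```

Because every row is a rotation of the same pattern, a circuit only needs the eight pattern bits, routed to different input bits for each output row, rather than a separate matrix per transformation.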

If both affine transformations are implemented as suggested by Daemen and Rijmen (“FIPS 197”) using exclusive OR gates, the circuit equations look as follows:

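Affine Transform Equations (an overbar denotes negation)

$\begin{matrix}{\begin{matrix}
b'_0 = \overline{b_0 \oplus b_4 \oplus b_5 \oplus b_6 \oplus b_7} \\
b'_1 = \overline{b_0 \oplus b_1 \oplus b_5 \oplus b_6 \oplus b_7} \\
b'_2 = b_0 \oplus b_1 \oplus b_2 \oplus b_6 \oplus b_7 \\
b'_3 = b_0 \oplus b_1 \oplus b_2 \oplus b_3 \oplus b_7 \\
b'_4 = b_0 \oplus b_1 \oplus b_2 \oplus b_3 \oplus b_4 \\
b'_5 = \overline{b_1 \oplus b_2 \oplus b_3 \oplus b_4 \oplus b_5} \\
b'_6 = \overline{b_2 \oplus b_3 \oplus b_4 \oplus b_5 \oplus b_6} \\
b'_7 = b_3 \oplus b_4 \oplus b_5 \oplus b_6 \oplus b_7
\end{matrix}} & (1.7)\end{matrix}$

Notice that each equation has an odd number of terms and the same number of terms: five. The addition of the vector determines the negation of some equations. So the number of terms in each equation is determined by the “load pattern”. The number of negations is determined by the addition of the vector, which is termed the “load vector”.

Inverse Affine Transform Equations

$\begin{matrix}
b'_0 = \overline{b_2 \oplus b_5 \oplus b_7} \\
b'_1 = b_0 \oplus b_3 \oplus b_6 \\
b'_2 = \overline{b_1 \oplus b_4 \oplus b_7} \\
b'_3 = b_0 \oplus b_2 \oplus b_5 \\
b'_4 = b_1 \oplus b_3 \oplus b_6 \\
b'_5 = b_2 \oplus b_4 \oplus b_7 \\
b'_6 = b_0 \oplus b_3 \oplus b_5 \\
b'_7 = b_1 \oplus b_4 \oplus b_6
\end{matrix}$

Each equation has an odd number of terms and the same number of terms: three. The addition of the vector determines the negation of some equations. So the number of terms in each equation is determined by the “load pattern”. The number of negations is determined by the addition of the vector.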

This addition vector can now be used as a “load vector” as well. Looking at the two sets of equations, it appears that there is no common logic to be merged. If the equations are rewritten with the “load pattern” included and use the addition of the vector to determine the negations, a common circuit is revealed. The properties of the exclusive OR are used to accomplish this and these properties are:

$\begin{matrix}{A \oplus B \oplus C = C \oplus B \oplus A} & (1.9)\end{matrix}$

$\begin{matrix}{A \oplus 0 = A} & (2.0)\end{matrix}$

$\begin{matrix}{A \oplus 1 = \overline{A}} & (2.1)\end{matrix}$

$\begin{matrix}{A \oplus A = 0} & (2.2)\end{matrix}$

In a preferred embodiment, the circuit implementing both the affine and inverse affine transforms comprises a Multiplicative Inverse ROM, and the logic that represents both transforms is as follows, with p as the “load pattern” and v as the “load vector”. For example, here is what equation seven of the affine matrix becomes:

$\begin{matrix}{b'_7 = [(b_0 \cdot p_1) \oplus (b_1 \cdot p_2) \oplus (b_2 \cdot p_3) \oplus (b_3 \cdot p_4) \oplus (b_4 \cdot p_5) \oplus (b_5 \cdot p_6) \oplus (b_6 \cdot p_7) \oplus (b_7 \cdot p_0)] \oplus v_7} & (2.3)\end{matrix}$

The number of instantiations has been cut in half. Because of the 0's produced by the ANDing of p and b, the equation works for both affine and inverse affine transformations: by property (2.0), a term whose pattern bit is 0 drops out of the XOR. Because b XOR'ed with a 1 is always the complement of b, XORing with the load vector bit (v₇ in this equation) negates the equation where appropriate.
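The merged equations can be modelled bit by bit as follows. This is a minimal sketch, assuming the load patterns given above and taking the load vectors from the addition vectors of equations (1.5) and (1.6); the names `LOADS` and `affine_all` are hypothetical. Output bit j uses the term b_i · p_((i−j) mod 8), which for j = 7 gives exactly the pairing of equation (2.3).

```python
# Illustrative software model of the Affine-All equations; names are assumptions.
LOADS = {
    True:  ("10001111", "11000110"),   # encrypt: affine load pattern and load vector
    False: ("00100101", "10100000"),   # decrypt: inverse affine load pattern and vector
}


def affine_all(byte, encrypt):
    """Apply the affine (encrypt=True) or inverse affine (encrypt=False) transform."""
    pattern, vector = LOADS[encrypt]
    p = [int(ch) for ch in pattern]          # p[0]..p[7]
    v = [int(ch) for ch in vector]           # v[0]..v[7]
    b = [(byte >> i) & 1 for i in range(8)]  # b[0] is the least significant bit
    out = 0
    for j in range(8):
        acc = v[j]                           # XOR with 1 negates the row (property 2.1)
        for i in range(8):
            acc ^= b[i] & p[(i - j) % 8]     # AND with 0 drops the term (property 2.0)
        out |= acc << j
    return out


if __name__ == "__main__":
    print(format(affine_all(0x00, True), "02x"))
    print(all(affine_all(affine_all(x, True), False) == x for x in range(256)))
```

Only the pattern and vector change between the two modes; the AND/XOR structure is identical, which is the sharing that the single circuit exploits.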

Comparisons:

Using the design suggested by the AES/Rijndael proposal (FIPS 197) implemented in two ways:

(1) a 128-entry, 16-bit word ROM, and

(2) a 128-entry, 16-bit word lookup table implemented as a multiplexor,

the ROM, Multiplexor and Affine-All circuit embodiment of the present invention were synthesized and timed using maximum path analysis. FIG. 2 compares results, where sizes in gates are given as well as sizes in microns for comparison with the ROM implementation. Net area is not considered because wire load models differ with technologies.

A preferred embodiment of the ROM or lookup table contains the values shown in FIG. 3, in hexadecimal format.
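For readers without access to the figure, a multiplicative inverse table for GF(2⁸) can be generated with the sketch below. It is an assumption that such a generated table corresponds to the ROM contents of the preferred embodiment; the sketch only computes inverses modulo m(x) with {00} mapped to itself, and the names `build_inverse_rom` and `format_rom` are hypothetical.

```python
# Illustrative generator for a 256-entry multiplicative inverse table; names are assumptions.
M_X = 0x11B  # m(x) = x^8 + x^4 + x^3 + x + 1


def gf_mul(a, b):
    """Multiply in GF(2^8) modulo m(x)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= M_X
        b >>= 1
    return result


def build_inverse_rom():
    """Return a list where entry x holds the multiplicative inverse of x ({00} -> {00})."""
    rom = [0] * 256
    for a in range(1, 256):
        for candidate in range(1, 256):
            if gf_mul(a, candidate) == 1:
                rom[a] = candidate
                break
    return rom


def format_rom(rom):
    """Render the table as 16 rows of 16 two-digit hexadecimal values."""
    return [
        " ".join("%02x" % value for value in rom[start:start + 16])
        for start in range(0, 256, 16)
    ]


if __name__ == "__main__":
    for line in format_rom(build_inverse_rom()):
        print(line)
```

The brute-force search is adequate for a one-time table build; a hardware flow would normally embed the resulting 256 bytes as ROM initialization data.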

The netlist of the Affine-All combinational logic of a preferred embodiment is shown in FIG. 4. The code for an implementation is included as Appendix A.
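A behavioral reference model can be useful when checking such a netlist in simulation. The sketch below composes one shared multiplicative inverse table with one Affine-All function to perform the substitution in both directions. It assumes the ordering defined in FIPS 197 (inverse, then affine, for encryption; inverse affine, then inverse, for decryption); how an embodiment routes data around the ROM is not specified here, and all names are hypothetical.

```python
# Illustrative reference model: shared inverse table plus Affine-All; names are assumptions.
M_X = 0x11B
LOADS = {True: ("10001111", "11000110"), False: ("00100101", "10100000")}


def gf_mul(a, b):
    """Multiply in GF(2^8) modulo m(x)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= M_X
        b >>= 1
    return result


def build_inverse_rom():
    """Entry x holds the multiplicative inverse of x, with {00} mapped to itself."""
    rom = [0] * 256
    for a in range(1, 256):
        rom[a] = next(c for c in range(1, 256) if gf_mul(a, c) == 1)
    return rom


INVERSE_ROM = build_inverse_rom()


def affine_all(byte, encrypt):
    """Affine (encrypt=True) or inverse affine (encrypt=False) via load pattern/vector."""
    pattern, vector = LOADS[encrypt]
    out = 0
    for j in range(8):
        acc = int(vector[j])
        for i in range(8):
            acc ^= ((byte >> i) & 1) & int(pattern[(i - j) % 8])
        out |= acc << j
    return out


def sub_byte_shared(byte, encrypt):
    """SubByte when encrypt is True, inverse SubByte when encrypt is False."""
    if encrypt:
        return affine_all(INVERSE_ROM[byte], True)
    return INVERSE_ROM[affine_all(byte, False)]


if __name__ == "__main__":
    print(format(sub_byte_shared(0x53, True), "02x"))
    print(format(sub_byte_shared(0xED, False), "02x"))
```

The 256 input/output pairs produced by `sub_byte_shared` in each mode could serve as expected values in a testbench for the Affine-All logic combined with the ROM.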

The present invention is applicable to all systems and devices capable of secure communications, comprising security networking processors, secure keyboard devices, magnetic card reader devices, smart card reader devices, and wireless 802.11 devices.

The above-described embodiments are only typical examples, and their modifications and variations are apparent to those skilled in the art. Various modifications to the above-described embodiments can be made without departing from the scope of the invention as embodied in the accompanying claims.

Appendix A

The RTL to implement the affine all circuit is shown below:

`timescale 10ns/10ns

module aes_affine_all (
  byteOut, // output byte
  byteIn,  // input byte
  enCrypt  // 1 = encrypt 0 = decrypt
);

//---------------------------------------------------------------------
// ports
//---------------------------------------------------------------------
input        enCrypt;
input  [7:0] byteIn;
output [7:0] byteOut;

// Logic reduction
wire [4:0] byteOut_int;
wire [0:7] y_inv, y, y_int;

// Each wand net has two drivers (an input bit and a load-pattern bit);
// the wired-AND resolution yields the product term b[i] & p[k].
wand byteOut_7_0, byteOut_7_1, byteOut_7_2, byteOut_7_3, byteOut_7_4, byteOut_7_5, byteOut_7_6, byteOut_7_7;
wand byteOut_4_0, byteOut_4_1, byteOut_4_2, byteOut_4_3, byteOut_4_4, byteOut_4_5, byteOut_4_6, byteOut_4_7;
wand byteOut_int_4_0, byteOut_int_4_1, byteOut_int_4_2, byteOut_int_4_3, byteOut_int_4_4, byteOut_int_4_5, byteOut_int_4_6, byteOut_int_4_7;
wand byteOut_int_3_0, byteOut_int_3_1, byteOut_int_3_2, byteOut_int_3_3, byteOut_int_3_4, byteOut_int_3_5, byteOut_int_3_6, byteOut_int_3_7;
wand byteOut_3_0, byteOut_3_1, byteOut_3_2, byteOut_3_3, byteOut_3_4, byteOut_3_5, byteOut_3_6, byteOut_3_7;
wand byteOut_int_2_0, byteOut_int_2_1, byteOut_int_2_2, byteOut_int_2_3, byteOut_int_2_4, byteOut_int_2_5, byteOut_int_2_6, byteOut_int_2_7;
wand byteOut_int_1_0, byteOut_int_1_1, byteOut_int_1_2, byteOut_int_1_3, byteOut_int_1_4, byteOut_int_1_5, byteOut_int_1_6, byteOut_int_1_7;
wand byteOut_int_0_0, byteOut_int_0_1, byteOut_int_0_2, byteOut_int_0_3, byteOut_int_0_4, byteOut_int_0_5, byteOut_int_0_6, byteOut_int_0_7;

// Load patterns; y_int[0] is the leftmost bit of the selected pattern
assign y_inv = 8'b00100101;
assign y     = 8'b10001111;
assign y_int = (enCrypt) ? y : y_inv;

// Output bit 7 (load vector bit is 0 in both modes)
assign byteOut_7_0 = byteIn[0]; assign byteOut_7_0 = y_int[1];
assign byteOut_7_1 = byteIn[1]; assign byteOut_7_1 = y_int[2];
assign byteOut_7_2 = byteIn[2]; assign byteOut_7_2 = y_int[3];
assign byteOut_7_3 = byteIn[3]; assign byteOut_7_3 = y_int[4];
assign byteOut_7_4 = byteIn[4]; assign byteOut_7_4 = y_int[5];
assign byteOut_7_5 = byteIn[5]; assign byteOut_7_5 = y_int[6];
assign byteOut_7_6 = byteIn[6]; assign byteOut_7_6 = y_int[7];
assign byteOut_7_7 = byteIn[7]; assign byteOut_7_7 = y_int[0];
assign byteOut[7] = byteOut_7_0 ^ byteOut_7_1 ^ byteOut_7_2 ^ byteOut_7_3 ^
                    byteOut_7_4 ^ byteOut_7_5 ^ byteOut_7_6 ^ byteOut_7_7;

// Intermediate term for output bit 6
assign byteOut_int_4_0 = byteIn[0]; assign byteOut_int_4_0 = y_int[2];
assign byteOut_int_4_1 = byteIn[1]; assign byteOut_int_4_1 = y_int[3];
assign byteOut_int_4_2 = byteIn[2]; assign byteOut_int_4_2 = y_int[4];
assign byteOut_int_4_3 = byteIn[3]; assign byteOut_int_4_3 = y_int[5];
assign byteOut_int_4_4 = byteIn[4]; assign byteOut_int_4_4 = y_int[6];
assign byteOut_int_4_5 = byteIn[5]; assign byteOut_int_4_5 = y_int[7];
assign byteOut_int_4_6 = byteIn[6]; assign byteOut_int_4_6 = y_int[0];
assign byteOut_int_4_7 = byteIn[7]; assign byteOut_int_4_7 = y_int[1];
assign byteOut_int[4] = byteOut_int_4_0 ^ byteOut_int_4_1 ^ byteOut_int_4_2 ^ byteOut_int_4_3 ^
                        byteOut_int_4_4 ^ byteOut_int_4_5 ^ byteOut_int_4_6 ^ byteOut_int_4_7;

// Intermediate term for output bit 5
assign byteOut_int_3_0 = byteIn[0]; assign byteOut_int_3_0 = y_int[3];
assign byteOut_int_3_1 = byteIn[1]; assign byteOut_int_3_1 = y_int[4];
assign byteOut_int_3_2 = byteIn[2]; assign byteOut_int_3_2 = y_int[5];
assign byteOut_int_3_3 = byteIn[3]; assign byteOut_int_3_3 = y_int[6];
assign byteOut_int_3_4 = byteIn[4]; assign byteOut_int_3_4 = y_int[7];
assign byteOut_int_3_5 = byteIn[5]; assign byteOut_int_3_5 = y_int[0];
assign byteOut_int_3_6 = byteIn[6]; assign byteOut_int_3_6 = y_int[1];
assign byteOut_int_3_7 = byteIn[7]; assign byteOut_int_3_7 = y_int[2];
assign byteOut_int[3] = byteOut_int_3_0 ^ byteOut_int_3_1 ^ byteOut_int_3_2 ^ byteOut_int_3_3 ^
                        byteOut_int_3_4 ^ byteOut_int_3_5 ^ byteOut_int_3_6 ^ byteOut_int_3_7;

// Output bit 4 (load vector bit is 0 in both modes)
assign byteOut_4_0 = byteIn[0]; assign byteOut_4_0 = y_int[4];
assign byteOut_4_1 = byteIn[1]; assign byteOut_4_1 = y_int[5];
assign byteOut_4_2 = byteIn[2]; assign byteOut_4_2 = y_int[6];
assign byteOut_4_3 = byteIn[3]; assign byteOut_4_3 = y_int[7];
assign byteOut_4_4 = byteIn[4]; assign byteOut_4_4 = y_int[0];
assign byteOut_4_5 = byteIn[5]; assign byteOut_4_5 = y_int[1];
assign byteOut_4_6 = byteIn[6]; assign byteOut_4_6 = y_int[2];
assign byteOut_4_7 = byteIn[7]; assign byteOut_4_7 = y_int[3];
assign byteOut[4] = byteOut_4_0 ^ byteOut_4_1 ^ byteOut_4_2 ^ byteOut_4_3 ^
                    byteOut_4_4 ^ byteOut_4_5 ^ byteOut_4_6 ^ byteOut_4_7;

// Output bit 3 (load vector bit is 0 in both modes)
assign byteOut_3_0 = byteIn[0]; assign byteOut_3_0 = y_int[5];
assign byteOut_3_1 = byteIn[1]; assign byteOut_3_1 = y_int[6];
assign byteOut_3_2 = byteIn[2]; assign byteOut_3_2 = y_int[7];
assign byteOut_3_3 = byteIn[3]; assign byteOut_3_3 = y_int[0];
assign byteOut_3_4 = byteIn[4]; assign byteOut_3_4 = y_int[1];
assign byteOut_3_5 = byteIn[5]; assign byteOut_3_5 = y_int[2];
assign byteOut_3_6 = byteIn[6]; assign byteOut_3_6 = y_int[3];
assign byteOut_3_7 = byteIn[7]; assign byteOut_3_7 = y_int[4];
assign byteOut[3] = byteOut_3_0 ^ byteOut_3_1 ^ byteOut_3_2 ^ byteOut_3_3 ^
                    byteOut_3_4 ^ byteOut_3_5 ^ byteOut_3_6 ^ byteOut_3_7;

// Intermediate term for output bit 2 (pairwise XOR written as AND/OR)
assign byteOut_int_2_0 = byteIn[0]; assign byteOut_int_2_0 = y_int[6];
assign byteOut_int_2_1 = byteIn[1]; assign byteOut_int_2_1 = y_int[7];
assign byteOut_int_2_2 = byteIn[2]; assign byteOut_int_2_2 = y_int[0];
assign byteOut_int_2_3 = byteIn[3]; assign byteOut_int_2_3 = y_int[1];
assign byteOut_int_2_4 = byteIn[4]; assign byteOut_int_2_4 = y_int[2];
assign byteOut_int_2_5 = byteIn[5]; assign byteOut_int_2_5 = y_int[3];
assign byteOut_int_2_6 = byteIn[6]; assign byteOut_int_2_6 = y_int[4];
assign byteOut_int_2_7 = byteIn[7]; assign byteOut_int_2_7 = y_int[5];
assign byteOut_int[2] = (~byteOut_int_2_0 & byteOut_int_2_1 | ~byteOut_int_2_1 & byteOut_int_2_0) ^
                        (~byteOut_int_2_2 & byteOut_int_2_3 | ~byteOut_int_2_3 & byteOut_int_2_2) ^
                        (~byteOut_int_2_4 & byteOut_int_2_5 | ~byteOut_int_2_5 & byteOut_int_2_4) ^
                        (~byteOut_int_2_6 & byteOut_int_2_7 | ~byteOut_int_2_7 & byteOut_int_2_6);

// Intermediate term for output bit 1
assign byteOut_int_1_0 = byteIn[0]; assign byteOut_int_1_0 = y_int[7];
assign byteOut_int_1_1 = byteIn[1]; assign byteOut_int_1_1 = y_int[0];
assign byteOut_int_1_2 = byteIn[2]; assign byteOut_int_1_2 = y_int[1];
assign byteOut_int_1_3 = byteIn[3]; assign byteOut_int_1_3 = y_int[2];
assign byteOut_int_1_4 = byteIn[4]; assign byteOut_int_1_4 = y_int[3];
assign byteOut_int_1_5 = byteIn[5]; assign byteOut_int_1_5 = y_int[4];
assign byteOut_int_1_6 = byteIn[6]; assign byteOut_int_1_6 = y_int[5];
assign byteOut_int_1_7 = byteIn[7]; assign byteOut_int_1_7 = y_int[6];
assign byteOut_int[1] = byteOut_int_1_0 ^ byteOut_int_1_1 ^ byteOut_int_1_2 ^ byteOut_int_1_3 ^
                        byteOut_int_1_4 ^ byteOut_int_1_5 ^ byteOut_int_1_6 ^ byteOut_int_1_7;

// Intermediate term for output bit 0
assign byteOut_int_0_0 = byteIn[0]; assign byteOut_int_0_0 = y_int[0];
assign byteOut_int_0_1 = byteIn[1]; assign byteOut_int_0_1 = y_int[1];
assign byteOut_int_0_2 = byteIn[2]; assign byteOut_int_0_2 = y_int[2];
assign byteOut_int_0_3 = byteIn[3]; assign byteOut_int_0_3 = y_int[3];
assign byteOut_int_0_4 = byteIn[4]; assign byteOut_int_0_4 = y_int[4];
assign byteOut_int_0_5 = byteIn[5]; assign byteOut_int_0_5 = y_int[5];
assign byteOut_int_0_6 = byteIn[6]; assign byteOut_int_0_6 = y_int[6];
assign byteOut_int_0_7 = byteIn[7]; assign byteOut_int_0_7 = y_int[7];
assign byteOut_int[0] = byteOut_int_0_0 ^ byteOut_int_0_1 ^ byteOut_int_0_2 ^ byteOut_int_0_3 ^
                        byteOut_int_0_4 ^ byteOut_int_0_5 ^ byteOut_int_0_6 ^ byteOut_int_0_7;

// Apply the load vector: negate where the vector bit for the selected mode is 1
assign byteOut[6] = (enCrypt) ? ~byteOut_int[4] : byteOut_int[4];
assign byteOut[5] = (enCrypt) ? ~byteOut_int[3] : byteOut_int[3];
assign byteOut[2] = (enCrypt) ? byteOut_int[2] : ~byteOut_int[2];
assign byteOut[1] = (enCrypt) ? ~byteOut_int[1] : byteOut_int[1];
assign byteOut[0] = ~byteOut_int[0];

endmodule

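1. An apparatus for performing a SubByte function of the Rijndael Block Cipher, comprising: an S-box circuit including an inverse transformation circuit having a lookup table and being configured and arranged to transform an input using a look-up table, wherein the look-up table is the multiplicative inverse in the finite field GF(2⁸) having {00} mapped to itself, and the look-up table is implemented by a read-only memory (ROM); a combinational logic circuit configured and arranged to perform an affine-all transformation that performs both an affine and inverse affine transformation in response to respective load patterns, wherein the combinational logic circuit implements the equations:

$\begin{matrix}
b'_0 = [(b_0 \cdot p_0) \oplus (b_1 \cdot p_1) \oplus (b_2 \cdot p_2) \oplus (b_3 \cdot p_3) \oplus (b_4 \cdot p_4) \oplus (b_5 \cdot p_5) \oplus (b_6 \cdot p_6) \oplus (b_7 \cdot p_7)] \oplus v_0 \\
b'_1 = [(b_0 \cdot p_7) \oplus (b_1 \cdot p_0) \oplus (b_2 \cdot p_1) \oplus (b_3 \cdot p_2) \oplus (b_4 \cdot p_3) \oplus (b_5 \cdot p_4) \oplus (b_6 \cdot p_5) \oplus (b_7 \cdot p_6)] \oplus v_1 \\
b'_2 = [(b_0 \cdot p_6) \oplus (b_1 \cdot p_7) \oplus (b_2 \cdot p_0) \oplus (b_3 \cdot p_1) \oplus (b_4 \cdot p_2) \oplus (b_5 \cdot p_3) \oplus (b_6 \cdot p_4) \oplus (b_7 \cdot p_5)] \oplus v_2 \\
b'_3 = [(b_0 \cdot p_5) \oplus (b_1 \cdot p_6) \oplus (b_2 \cdot p_7) \oplus (b_3 \cdot p_0) \oplus (b_4 \cdot p_1) \oplus (b_5 \cdot p_2) \oplus (b_6 \cdot p_3) \oplus (b_7 \cdot p_4)] \oplus v_3 \\
b'_4 = [(b_0 \cdot p_4) \oplus (b_1 \cdot p_5) \oplus (b_2 \cdot p_6) \oplus (b_3 \cdot p_7) \oplus (b_4 \cdot p_0) \oplus (b_5 \cdot p_1) \oplus (b_6 \cdot p_2) \oplus (b_7 \cdot p_3)] \oplus v_4 \\
b'_5 = [(b_0 \cdot p_3) \oplus (b_1 \cdot p_4) \oplus (b_2 \cdot p_5) \oplus (b_3 \cdot p_6) \oplus (b_4 \cdot p_7) \oplus (b_5 \cdot p_0) \oplus (b_6 \cdot p_1) \oplus (b_7 \cdot p_2)] \oplus v_5 \\
b'_6 = [(b_0 \cdot p_2) \oplus (b_1 \cdot p_3) \oplus (b_2 \cdot p_4) \oplus (b_3 \cdot p_5) \oplus (b_4 \cdot p_6) \oplus (b_5 \cdot p_7) \oplus (b_6 \cdot p_0) \oplus (b_7 \cdot p_1)] \oplus v_6 \\
b'_7 = [(b_0 \cdot p_1) \oplus (b_1 \cdot p_2) \oplus (b_2 \cdot p_3) \oplus (b_3 \cdot p_4) \oplus (b_4 \cdot p_5) \oplus (b_5 \cdot p_6) \oplus (b_6 \cdot p_7) \oplus (b_7 \cdot p_0)] \oplus v_7
\end{matrix}$

having $p = p_0 p_1 p_2 p_3 p_4 p_5 p_6 p_7$ as a load pattern consisting of {10001111} for the affine transformation and {00100101} for the inverse affine transformation and having v as a load vector $= v_0 v_1 v_2 v_3 v_4 v_5 v_6 v_7$ consisting of {11000110} for the affine transformation and {10100000} for the inverse affine transformation.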
2. An apparatus for encrypting and decrypting data, comprising: a data processing module arranged to perform a byte substitution, wherein at least part of said data processing module comprises: a look-up table which is the multiplicative inverse in the finite field GF(2⁸) having {00} mapped to itself, and the look-up table is implemented by a read-only memory (ROM), a storage device for storing the look-up table, and a circuit having shared logic that performs a single transform that accomplishes an affine and an inverse affine transformation, wherein the circuit having shared logic implements the equations:

$\begin{matrix}
b'_0 = [(b_0 \cdot p_0) \oplus (b_1 \cdot p_1) \oplus (b_2 \cdot p_2) \oplus (b_3 \cdot p_3) \oplus (b_4 \cdot p_4) \oplus (b_5 \cdot p_5) \oplus (b_6 \cdot p_6) \oplus (b_7 \cdot p_7)] \oplus v_0 \\
b'_1 = [(b_0 \cdot p_7) \oplus (b_1 \cdot p_0) \oplus (b_2 \cdot p_1) \oplus (b_3 \cdot p_2) \oplus (b_4 \cdot p_3) \oplus (b_5 \cdot p_4) \oplus (b_6 \cdot p_5) \oplus (b_7 \cdot p_6)] \oplus v_1 \\
b'_2 = [(b_0 \cdot p_6) \oplus (b_1 \cdot p_7) \oplus (b_2 \cdot p_0) \oplus (b_3 \cdot p_1) \oplus (b_4 \cdot p_2) \oplus (b_5 \cdot p_3) \oplus (b_6 \cdot p_4) \oplus (b_7 \cdot p_5)] \oplus v_2 \\
b'_3 = [(b_0 \cdot p_5) \oplus (b_1 \cdot p_6) \oplus (b_2 \cdot p_7) \oplus (b_3 \cdot p_0) \oplus (b_4 \cdot p_1) \oplus (b_5 \cdot p_2) \oplus (b_6 \cdot p_3) \oplus (b_7 \cdot p_4)] \oplus v_3 \\
b'_4 = [(b_0 \cdot p_4) \oplus (b_1 \cdot p_5) \oplus (b_2 \cdot p_6) \oplus (b_3 \cdot p_7) \oplus (b_4 \cdot p_0) \oplus (b_5 \cdot p_1) \oplus (b_6 \cdot p_2) \oplus (b_7 \cdot p_3)] \oplus v_4 \\
b'_5 = [(b_0 \cdot p_3) \oplus (b_1 \cdot p_4) \oplus (b_2 \cdot p_5) \oplus (b_3 \cdot p_6) \oplus (b_4 \cdot p_7) \oplus (b_5 \cdot p_0) \oplus (b_6 \cdot p_1) \oplus (b_7 \cdot p_2)] \oplus v_5 \\
b'_6 = [(b_0 \cdot p_2) \oplus (b_1 \cdot p_3) \oplus (b_2 \cdot p_4) \oplus (b_3 \cdot p_5) \oplus (b_4 \cdot p_6) \oplus (b_5 \cdot p_7) \oplus (b_6 \cdot p_0) \oplus (b_7 \cdot p_1)] \oplus v_6 \\
b'_7 = [(b_0 \cdot p_1) \oplus (b_1 \cdot p_2) \oplus (b_2 \cdot p_3) \oplus (b_3 \cdot p_4) \oplus (b_4 \cdot p_5) \oplus (b_5 \cdot p_6) \oplus (b_6 \cdot p_7) \oplus (b_7 \cdot p_0)] \oplus v_7
\end{matrix}$

having $p = p_0 p_1 p_2 p_3 p_4 p_5 p_6 p_7$ as a load pattern consisting of {10001111} for the affine transformation and {00100101} for the inverse affine transformation and having v as a load vector $= v_0 v_1 v_2 v_3 v_4 v_5 v_6 v_7$ consisting of {11000110} for the affine transformation and {10100000} for the inverse affine transformation.

3. The apparatus as claimed in claim 2, wherein the apparatus comprises a plurality of instances of a data processing module arranged in a data processing pipeline.
4. The apparatus as claimed in claim 2, wherein the apparatus is arranged to perform encryption or decryption in accordance with the Rijndael Block Cipher, and wherein the data processing module is arranged to implement a Rijndael round.

5. An apparatus as claimed in claim 4, wherein the data processing module is arranged to implement the SubByte transformation of the Rijndael round using the lookup table composed with the affine transformation for encryption and the inverse affine transformation for decryption.

6. The apparatus as claimed in claim 5, wherein said look-up table is implemented by means of a read only memory (ROM).

7. The apparatus as claimed in claim 2, wherein, for a given input vector having a number of bits, the shared logic is configured to perform an inverse affine transform responsive to one load pattern and to perform an affine transformation responsive to another load pattern, the load patterns having the same number of bits as the input vector.
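8. An apparatus for performing a SubByte function of a round of the Rijndael Block Cipher, comprising an S-box constructed by composing, means for obtaining the multiplicative inverse in the finite field GF(2⁸), and means for performing an affine-all transformation consisting of an affine and inverse affine transformation as a single affine transformation, wherein the means for performing implements the equations:

$\begin{matrix}
b'_0 = [(b_0 \cdot p_0) \oplus (b_1 \cdot p_1) \oplus (b_2 \cdot p_2) \oplus (b_3 \cdot p_3) \oplus (b_4 \cdot p_4) \oplus (b_5 \cdot p_5) \oplus (b_6 \cdot p_6) \oplus (b_7 \cdot p_7)] \oplus v_0 \\
b'_1 = [(b_0 \cdot p_7) \oplus (b_1 \cdot p_0) \oplus (b_2 \cdot p_1) \oplus (b_3 \cdot p_2) \oplus (b_4 \cdot p_3) \oplus (b_5 \cdot p_4) \oplus (b_6 \cdot p_5) \oplus (b_7 \cdot p_6)] \oplus v_1 \\
b'_2 = [(b_0 \cdot p_6) \oplus (b_1 \cdot p_7) \oplus (b_2 \cdot p_0) \oplus (b_3 \cdot p_1) \oplus (b_4 \cdot p_2) \oplus (b_5 \cdot p_3) \oplus (b_6 \cdot p_4) \oplus (b_7 \cdot p_5)] \oplus v_2 \\
b'_3 = [(b_0 \cdot p_5) \oplus (b_1 \cdot p_6) \oplus (b_2 \cdot p_7) \oplus (b_3 \cdot p_0) \oplus (b_4 \cdot p_1) \oplus (b_5 \cdot p_2) \oplus (b_6 \cdot p_3) \oplus (b_7 \cdot p_4)] \oplus v_3 \\
b'_4 = [(b_0 \cdot p_4) \oplus (b_1 \cdot p_5) \oplus (b_2 \cdot p_6) \oplus (b_3 \cdot p_7) \oplus (b_4 \cdot p_0) \oplus (b_5 \cdot p_1) \oplus (b_6 \cdot p_2) \oplus (b_7 \cdot p_3)] \oplus v_4 \\
b'_5 = [(b_0 \cdot p_3) \oplus (b_1 \cdot p_4) \oplus (b_2 \cdot p_5) \oplus (b_3 \cdot p_6) \oplus (b_4 \cdot p_7) \oplus (b_5 \cdot p_0) \oplus (b_6 \cdot p_1) \oplus (b_7 \cdot p_2)] \oplus v_5 \\
b'_6 = [(b_0 \cdot p_2) \oplus (b_1 \cdot p_3) \oplus (b_2 \cdot p_4) \oplus (b_3 \cdot p_5) \oplus (b_4 \cdot p_6) \oplus (b_5 \cdot p_7) \oplus (b_6 \cdot p_0) \oplus (b_7 \cdot p_1)] \oplus v_6 \\
b'_7 = [(b_0 \cdot p_1) \oplus (b_1 \cdot p_2) \oplus (b_2 \cdot p_3) \oplus (b_3 \cdot p_4) \oplus (b_4 \cdot p_5) \oplus (b_5 \cdot p_6) \oplus (b_6 \cdot p_7) \oplus (b_7 \cdot p_0)] \oplus v_7
\end{matrix}$

having $p = p_0 p_1 p_2 p_3 p_4 p_5 p_6 p_7$ as a load pattern consisting of {10001111} for the affine transformation and {00100101} for the inverse affine transformation and having v as a load vector $= v_0 v_1 v_2 v_3 v_4 v_5 v_6 v_7$ consisting of {11000110} for the affine transformation and {10100000} for the inverse affine transformation.

9. The apparatus as claimed in claim 8, wherein said means for obtaining the multiplicative inverse is a look-up table, and said means for performing the affine-all transformation is a combinational logic circuit.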
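10. A method for performing a SubByte function of a Rijndael round of the Rijndael Block Cipher, comprising the steps of creating a look-up table for the multiplicative inverse in the finite field GF(2⁸); providing an affine-all transformation consisting of an affine and inverse affine transformation in a single affine transformation, using the equations:

$\begin{matrix}
b'_0 = [(b_0 \cdot p_0) \oplus (b_1 \cdot p_1) \oplus (b_2 \cdot p_2) \oplus (b_3 \cdot p_3) \oplus (b_4 \cdot p_4) \oplus (b_5 \cdot p_5) \oplus (b_6 \cdot p_6) \oplus (b_7 \cdot p_7)] \oplus v_0 \\
b'_1 = [(b_0 \cdot p_7) \oplus (b_1 \cdot p_0) \oplus (b_2 \cdot p_1) \oplus (b_3 \cdot p_2) \oplus (b_4 \cdot p_3) \oplus (b_5 \cdot p_4) \oplus (b_6 \cdot p_5) \oplus (b_7 \cdot p_6)] \oplus v_1 \\
b'_2 = [(b_0 \cdot p_6) \oplus (b_1 \cdot p_7) \oplus (b_2 \cdot p_0) \oplus (b_3 \cdot p_1) \oplus (b_4 \cdot p_2) \oplus (b_5 \cdot p_3) \oplus (b_6 \cdot p_4) \oplus (b_7 \cdot p_5)] \oplus v_2 \\
b'_3 = [(b_0 \cdot p_5) \oplus (b_1 \cdot p_6) \oplus (b_2 \cdot p_7) \oplus (b_3 \cdot p_0) \oplus (b_4 \cdot p_1) \oplus (b_5 \cdot p_2) \oplus (b_6 \cdot p_3) \oplus (b_7 \cdot p_4)] \oplus v_3 \\
b'_4 = [(b_0 \cdot p_4) \oplus (b_1 \cdot p_5) \oplus (b_2 \cdot p_6) \oplus (b_3 \cdot p_7) \oplus (b_4 \cdot p_0) \oplus (b_5 \cdot p_1) \oplus (b_6 \cdot p_2) \oplus (b_7 \cdot p_3)] \oplus v_4 \\
b'_5 = [(b_0 \cdot p_3) \oplus (b_1 \cdot p_4) \oplus (b_2 \cdot p_5) \oplus (b_3 \cdot p_6) \oplus (b_4 \cdot p_7) \oplus (b_5 \cdot p_0) \oplus (b_6 \cdot p_1) \oplus (b_7 \cdot p_2)] \oplus v_5 \\
b'_6 = [(b_0 \cdot p_2) \oplus (b_1 \cdot p_3) \oplus (b_2 \cdot p_4) \oplus (b_3 \cdot p_5) \oplus (b_4 \cdot p_6) \oplus (b_5 \cdot p_7) \oplus (b_6 \cdot p_0) \oplus (b_7 \cdot p_1)] \oplus v_6 \\
b'_7 = [(b_0 \cdot p_1) \oplus (b_1 \cdot p_2) \oplus (b_2 \cdot p_3) \oplus (b_3 \cdot p_4) \oplus (b_4 \cdot p_5) \oplus (b_5 \cdot p_6) \oplus (b_6 \cdot p_7) \oplus (b_7 \cdot p_0)] \oplus v_7
\end{matrix}$

having $p = p_0 p_1 p_2 p_3 p_4 p_5 p_6 p_7$ as a load pattern consisting of {10001111} for the affine transformation and {00100101} for the inverse affine transformation and having v as a load vector $= v_0 v_1 v_2 v_3 v_4 v_5 v_6 v_7$ consisting of {11000110} for the affine transformation and {10100000} for the inverse affine transformation; composing an S-box constructed of the look-up table and the affine-all transformation; and performing a non-linear byte substitution using the composed S-box.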
11. The method of claim 10, wherein the providing step further comprises the step of providing a shared logic circuit that performs the single affine transformation.

12. The method of claim 10, further comprising the step of storing the look-up table in a read-only memory (ROM).

13. The method of claim 12, wherein the providing step further comprises the step of implementing a shared logic circuit that performs the single affine transformation.

14. The method of claim 10, wherein: the look-up table is the multiplicative inverse in the finite field GF(2⁸) having {00} mapped to itself; and the providing step further comprises the step of implementing a combinational logic circuit that performs the single affine transformation.